PCFA: mining of projected clusters in high dimensional data using modified FCM algorithm
Authors
Abstract
Data clustering deals with the problem of partitioning a group of objects into a fixed number of subsets so that the similarity of the objects within each subset is increased and the similarity across subsets is reduced. Several clustering algorithms have been proposed in the literature; k-means and Fuzzy C-Means (FCM) are the two most popular for partitioning numerical data into groups. Because both algorithms have drawbacks, recent research has focused on modifying them. In this paper, we analyze how the FCM clustering algorithm can be modified to overcome the difficulties that k-means and FCM face on high-dimensional data. Accordingly, we propose an algorithm called Projected Clustering based on FCM Algorithm (PCFA). We first use the standard FCM clustering algorithm to sub-cluster the high-dimensional data into reference centroids. The matrix containing the reference values is then fed as input to the modified FCM algorithm. Finally, experiments are carried out on very high-dimensional datasets obtained from benchmark data repositories, and the performance of PCFA is evaluated in terms of clustering accuracy, memory usage, and computation time. The results show that PCFA achieves approximately a 20% improvement in execution time and a 50% improvement in memory usage over the PCKA algorithm.
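For reference, the standard FCM iteration that the abstract refers to can be sketched as below. This is a minimal pure-Python sketch of textbook fuzzy c-means, not the modified algorithm proposed in the paper; the function name `fcm`, the fuzzifier default `m = 2`, the random initialization, and the stopping tolerance are assumptions made for illustration.

```python
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative sketch, not PCFA).

    points     : list of equal-length lists of floats
    n_clusters : number of clusters c
    m          : fuzzifier, must be > 1 (m = 2 is a common default)
    Returns (centers, memberships), where memberships[i][j] is the
    degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from squared Euclidean distances.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - c[d]) ** 2 for d in range(dim))
                  for c in centers]
            zeros = [j for j, v in enumerate(d2) if v == 0.0]
            if zeros:
                # The point sits exactly on one or more centres.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0
                           for j in range(n_clusters)]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                s = sum(inv)
                new_row = [v / s for v in inv]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once the memberships have stabilised.
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    centers, memberships = fcm(data, n_clusters=2)
    print(centers)
```

Each pass recomputes every centre from all points and every membership from all centres, so the cost per iteration grows with the number of points, clusters, and dimensions. That cost is part of what makes plain FCM expensive on high-dimensional data.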
Similar resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Comparative Analysis of Fuzzy C- Mean and Modified Fuzzy Possibilistic C -Mean Algorithms in Data Mining
Data mining technology has emerged as a means of identifying patterns and trends in large quantities of data. Clustering is a primary data description method in data mining that groups the most similar data. Data clustering is an important problem in a wide variety of fields, including data mining, pattern recognition, and bioinformatics. It aims to organize a collection of data items into...
Comparative Investigations and Performance Analysis of Fcm and Mfpcm Algorithms on Iris Data
Data mining technology has emerged as a means of identifying patterns and trends in large quantities of data. Data mining is a computational intelligence discipline that contributes tools for data analysis, discovery of new knowledge, and autonomous decision making. Clustering is a primary data description method in data mining that groups the most similar data. Data clustering is an impor...
Frequent-Pattern based Iterative Projected Clustering
Irrelevant attributes add noise to high dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find the clusters in hidden subspaces. We realize the analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining freq...
Refining membership degrees obtained from fuzzy C-means by re-fuzzification
Fuzzy C-mean (FCM) is the most well-known and widely-used fuzzy clustering algorithm. However, one of the weaknesses of the FCM is the way it assigns membership degrees to data which is based on the distance to the cluster centers. Unfortunately, the membership degrees are determined without considering the shape and density of the clusters. In this paper, we propose an algorithm which takes th...
Journal title:
- Int. Arab J. Inf. Technol.
Volume: 11, Issue:
Pages: -
Publication date: 2014